Search CORE

24 research outputs found

Critical assessment of protein intrinsic disorder prediction

Author: CAID Predictors
DisProt Curators
Necci M
Piovesan D
Tosatto SCE
Publication date: 01/05/2021

Intrinsically disordered proteins, defying the traditional protein structure-function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in prediction of intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 after filtering out bona fide structured regions. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times among methods can vary by up to four orders of magnitude.
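Fmax, the headline metric above, is the highest F1-score a predictor reaches over all possible score thresholds. The following is a minimal Python sketch, assuming binary per-residue disorder labels and real-valued predictor scores pooled across all proteins of a dataset. The `fmax` function and the toy data are hypothetical, and CAID's own evaluation code may differ in details such as the threshold grid or per-protein averaging.

```python
from typing import Sequence, Tuple


def fmax(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (best F1, threshold) over every distinct score used as a cut-off.

    labels: 1 = disordered residue, 0 = ordered (pooled over all proteins).
    scores: predictor output per residue; a residue is called disordered
    when its score is >= the threshold.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = sum(labels)
    best_f1, best_t = 0.0, float("nan")
    for t in sorted(set(scores)):
        called = [y for y, s in zip(labels, scores) if s >= t]
        tp = sum(called)           # disordered residues called disordered
        fp = len(called) - tp      # ordered residues called disordered
        if tp == 0:
            continue               # F1 is zero or undefined at this cut-off
        precision = tp / (tp + fp)
        recall = tp / positives
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t


# Toy data, not CAID results.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.1]
print(fmax(labels, scores))  # best F1 = 6/7 at threshold 0.6
```

Scanning only the distinct observed scores is enough because F1 can only change when the cut-off crosses one of them.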

UCL Discovery

Lessons from the CAGI-4 Hopkins clinical panel challenge

Author: Adhikari A
Buckley BA
Carraro M
Chandonia J-M
Chhibber A
Cutting GR
Fu Y
Gasparini A
Jones DT
Kramer A
Kundu K
Lam HYK
Leonardi E
Moult J
Pal LR
Searls DB
Shah S
Sunyaev S
Tosatto SCE
Yin Y
Publication date: 01/01/2017

The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of fourteen possible classes of disease and one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of 63 patients (62%). Each prediction group correctly diagnosed at least one patient who was not successfully diagnosed by any other group. We discuss the causal variant predictions by the different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false positive rate of DNA-guided analysis in the absence of prior phenotypic indication.
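The "at least one predictor" figures above are union counts over groups, and the "diagnosed by no other group" claim is a set difference. The following is a minimal sketch of both. It assumes, hypothetically, that a group "identifies" a patient when its highest-probability class equals the laboratory's class; the abstract does not state the exact criterion. The data layout, the disease-class labels "CF" and "CARD", and all function names are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical layout: predictions[group][patient] -> {disease class: probability}.
Predictions = Dict[str, Dict[str, Dict[str, float]]]


def top_class(probs: Dict[str, float]) -> str:
    """Highest-probability class; ties go to the alphabetically first class."""
    return max(sorted(probs), key=lambda c: probs[c])


def correct_sets(predictions: Predictions, truth: Dict[str, str]) -> Dict[str, Set[str]]:
    """For each group, the patients whose true class it ranked first."""
    return {
        group: {p for p, cls in truth.items()
                if p in by_patient and top_class(by_patient[p]) == cls}
        for group, by_patient in predictions.items()
    }


def union_hits(predictions: Predictions, truth: Dict[str, str]) -> List[str]:
    """Patients correctly classified by at least one group."""
    return sorted(set().union(*correct_sets(predictions, truth).values()))


def unique_hits(predictions: Predictions, truth: Dict[str, str]) -> Dict[str, List[str]]:
    """Per group, patients that no other group classified correctly."""
    correct = correct_sets(predictions, truth)
    return {
        g: sorted(hits - set().union(*(correct[o] for o in correct if o != g)))
        for g, hits in correct.items()
    }


# Toy example with hypothetical class labels.
truth = {"p1": "CF", "p2": "CARD", "p3": "CF"}
predictions = {
    "groupA": {"p1": {"CF": 0.7, "CARD": 0.3}, "p2": {"CF": 0.6, "CARD": 0.4},
               "p3": {"CF": 0.2, "CARD": 0.8}},
    "groupB": {"p1": {"CF": 0.4, "CARD": 0.6}, "p2": {"CF": 0.1, "CARD": 0.9},
               "p3": {"CF": 0.3, "CARD": 0.7}},
}
print(union_hits(predictions, truth))   # ['p1', 'p2']
print(unique_hits(predictions, truth))  # {'groupA': ['p1'], 'groupB': ['p2']}
print(f"{36 / 43:.0%}", f"{39 / 63:.0%}")  # 84% 62%
```

The last line reproduces the rounded percentages quoted in the abstract from the raw counts.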

UCL Discovery

eScholarship - University of California

Archivio istituzionale della ricerca - Università di Padova

Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction

Author: A Zemla
AG Murzin
B Park
B Rost
B Wallner
BA Reva
BH Park
C Keasar
CH Wu
Ching-Wai Tan
CS Pettitt
D Eramian
D Shortle
David T Jones
DT Jones
J Moult
J Tsai
KT Simons
LJ McGuffin
M Fasnacht
M Wiederstein
MI Sadowski
MJ Sippl
N Siew
R Samudrala
SCE Tosatto
SF Altschul
W Kabsch
Y Xia
Y Zhang
Publication venue: BioMed Central
Publication date: 01/02/2008

Background: We present a novel method of protein fold decoy discrimination using machine learning, more specifically using neural networks. Here, decoy discrimination is represented as a machine learning problem, where neural networks are used to learn the native-like features of protein structures using a set of positive and negative training examples. A set of native protein structures provides the positive training examples, while negative training examples are simulated decoy structures obtained by reversing the sequences of native structures. Various features are extracted from the training dataset of positive and negative examples and used as inputs to the neural networks. Results: The best-performing neural network is the one whose inputs comprise PSI-BLAST [1] profiles of residue pairs, pairwise distances and the relative solvent accessibilities of the residues. This neural network is the best among all methods tested in discriminating the native structure from a set of decoys for all decoy datasets tested. Conclusion: This method is demonstrated to be viable, and furthermore, evolutionary information is successfully used in the neural networks to improve decoy discrimination.
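The training-set construction described above can be sketched as follows, under one reading of "reversing the sequences of native structures": the negative example keeps the native backbone and threads the reversed sequence onto it. In that case, distances are unchanged and only the residue identities at each pair differ, which is what the network must learn to detect. The following is a minimal, hypothetical sketch. It uses raw residue identities and C-alpha distances instead of the PSI-BLAST profiles and solvent accessibilities used in the paper. The 8 Å cutoff and minimum sequence separation are illustrative choices, not the authors' parameters.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
PairFeature = Tuple[str, str, float]  # (residue i, residue j, C-alpha distance in Å)


def reversed_sequence_decoy(sequence: str, ca_coords: Sequence[Coord]) -> Tuple[str, List[Coord]]:
    """Negative example: keep the native backbone, thread the reversed sequence onto it."""
    if len(sequence) != len(ca_coords):
        raise ValueError("one C-alpha coordinate per residue is required")
    return sequence[::-1], list(ca_coords)


def pair_features(sequence: str, ca_coords: Sequence[Coord],
                  cutoff: float = 8.0, min_separation: int = 3) -> List[PairFeature]:
    """Residue-pair features for pairs within `cutoff` Å and at least
    `min_separation` apart in sequence (both values are illustrative)."""
    features = []
    for i in range(len(sequence)):
        for j in range(i + min_separation, len(sequence)):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d <= cutoff:
                features.append((sequence[i], sequence[j], round(d, 2)))
    return features


# Toy four-residue "structure" with C-alpha atoms on a 3.8 Å square.
seq = "ACDE"
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
decoy_seq, decoy_coords = reversed_sequence_decoy(seq, coords)
print(pair_features(seq, coords))              # [('A', 'E', 3.8)]
print(pair_features(decoy_seq, decoy_coords))  # [('E', 'A', 3.8)]
```

Native and decoy feature vectors would then be labelled 1 and 0 respectively and fed to a classifier.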

Crossref

Directory of Open Access Journals

PubMed Central

UCL Discovery

The InterPro protein families and domains database: 20 years on

Author: Bateman A
Blum M
Bork P
Bridge A
Chang H-Y
Chuguransky S
Finn RD
Gough J
Grego T
Haft DH
Kandasaamy S
Letunic I
Marchler-Bauer A
Mi H
Mitchell A
Natale DA
Necci M
Nuka G
Orengo CA
Pandurangan AP
Paysan-Lafosse T
Qureshi M
Raj S
Richardson L
Rivoire C
Salazar GA
Sigrist CJA
Sillitoe I
Thanki N
Thomas PD
Tosatto SCE
Williams L
Wu CH
Publication date: 06/11/2020

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.
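InterProScan reports one member-database match per line, together with the InterPro entry that integrates that signature, if there is one. Grouping those lines by InterPro accession therefore shows equivalent signatures from different databases being combined into a single entry. The following is a minimal sketch. It assumes the InterProScan 5 tab-separated layout, with column 1 holding the protein accession, column 5 the signature accession, columns 7 and 8 the start and stop positions, and column 12 the InterPro accession. Check these positions against the documentation for your version. All accessions in the example rows are hypothetical placeholders.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[str, int, int]  # (member-database signature, start, stop)


def group_by_interpro(tsv_text: str) -> Dict[str, Dict[str, List[Hit]]]:
    """Map protein -> InterPro entry -> member-database hits.

    Assumes InterProScan 5 TSV columns (1-based): 1 protein accession,
    5 signature accession, 7 start, 8 stop, 12 InterPro accession.
    Signatures without an InterPro accession are filed under "unintegrated".
    """
    grouped: Dict[str, Dict[str, List[Hit]]] = defaultdict(lambda: defaultdict(list))
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        protein, signature = row[0], row[4]
        start, stop = int(row[6]), int(row[7])
        entry = row[11] if len(row) > 11 and row[11] not in ("", "-") else "unintegrated"
        grouped[protein][entry].append((signature, start, stop))
    return {protein: dict(entries) for protein, entries in grouped.items()}


# Hypothetical rows: two signatures integrated into one entry, one unintegrated.
tsv = "\n".join([
    "\t".join(["prot1", "md5", "400", "Pfam", "PF_EXAMPLE", "desc", "10", "250",
               "1e-50", "T", "01-01-2020", "IPR_EXAMPLE", "Example domain"]),
    "\t".join(["prot1", "md5", "400", "ProSiteProfiles", "PS_EXAMPLE", "desc", "12", "248",
               "30.1", "T", "01-01-2020", "IPR_EXAMPLE", "Example domain"]),
    "\t".join(["prot1", "md5", "400", "Coils", "Coil", "Coil", "300", "320",
               "-", "T", "01-01-2020"]),
])
print(group_by_interpro(tsv))
```

`csv.QUOTE_NONE` is used because the output is plain tab-separated text, where a stray quote in a description should not be treated as a field delimiter.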

UCL Discovery

TAP score: torsion angle propensity normalization applied to local protein structure evaluation

Author: A Albiero
A Andreeva
A Fiser
AD MacKerell
AT Brunger
C Colovos
CA Rohl
CI Branden
D Shortle
DA Pearlman
DT Jones
DW Cruickshank
F Melo
G Chikenji
GE Sims
GJ Kleywegt
GN Ramachandran
J Higo
JM Chandonia
L Esposito
MA DePristo
MA Wilson
MJ Bower
MJ Sippl
MV Shapovalov
N Deshpande
Q Fang
R Laskowski
R Luthy
RA Laskowski
RJ Read
RL Dunbrack Jr.
Roberto Battistutta
RW Hooft
S Miyazawa
SC Lovell
SC Tosatto
SCE Tosatto
Silvio CE Tosatto
TJ Oldfield
V Luzzati
Publication venue: Springer Science and Business Media LLC

Crossref

InterPro in 2019: improving coverage, classification and access to protein sequence annotations

Author: Attwood TK
Babbitt PC
Blum M
Bork P
Bridge A
Brown SD
Chang H-Y
El-Gebali S
Finn RD
Fraser MI
Gough J
Haft DR
Huang H
Letunic I
Lopez R
Luciani A
Madeira F
Marchler-Bauer A
Mi H
Mitchell AL
Natale DA
Necci M
Nuka G
Orengo C
Pandurangan AP
Paysan-Lafosse T
Pesseat S
Potter SC
Qureshi MA
Rawlings ND
Redaschi N
Richardson LJ
Rivoire C
Salazar GA
Sangrador-Vegas A
Sigrist CJA
Sillitoe I
Sutton GG
Thanki N
Thomas PD
Tosatto SCE
Yong S-Y
Publication date: 06/11/2018

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.
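The abstract contrasts two coverage measures. Sequence coverage is taken here as the fraction of sequences with at least one match. Residue coverage is taken as the fraction of all residues that fall inside at least one match. Overlapping matches must be merged so that no residue is counted twice. The following is a minimal sketch under those assumed definitions. InterPro's own calculation may differ, for example in which entry types it counts. The sequence identifiers and intervals are toy data.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) of one match


def covered_residues(intervals: List[Interval]) -> int:
    """Count residues inside the union of possibly overlapping intervals."""
    total = 0
    cur: Optional[Interval] = None
    for start, end in sorted(intervals):
        if cur is None or start > cur[1] + 1:      # gap: close the current block
            if cur is not None:
                total += cur[1] - cur[0] + 1
            cur = (start, end)
        else:                                      # overlap or adjacency: extend
            cur = (cur[0], max(cur[1], end))
    if cur is not None:
        total += cur[1] - cur[0] + 1
    return total


def coverage(lengths: Dict[str, int], matches: Dict[str, List[Interval]]) -> Tuple[float, float]:
    """(sequence coverage, residue coverage) over every sequence in `lengths`."""
    with_hit = sum(1 for seq_id in lengths if matches.get(seq_id))
    covered = sum(covered_residues(matches.get(seq_id, [])) for seq_id in lengths)
    return with_hit / len(lengths), covered / sum(lengths.values())


lengths = {"seqA": 100, "seqB": 50, "seqC": 200}          # toy data
matches = {"seqA": [(1, 40), (30, 60)], "seqB": [(10, 19)]}
print(coverage(lengths, matches))  # 2 of 3 sequences, 70 of 350 residues
```

The two numbers can diverge sharply: a proteome can have high sequence coverage while large stretches of each protein remain unannotated, which is why residue coverage is the more useful guide for curation.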

UCL Discovery

Identification and structural characterization of FYVE domain-containing proteins of Arabidopsis thaliana

Author: A Bairoch
A Hayakawa
A Marchler-Bauer
A Nicholls
A Otomo
A Petiot
A Sali
A Shisheva
A Simonsen
AA Canutescu
AC Rutherford
AC Wallace
AE Wurmser
AGI
B Boeckmann
B Contreras-Moreira
B Heras
B Mueller-Roeber
B Wallner
BK Drobak
BK Shoichet
BY Qin
CA Brearley
CG Burd
D Bashford
D Petrey
D Sbrissa
D Tobi
DC Soares
DH Kim
E Panopoulou
E Psachoulia
EC Meng
EF Pettersen
Ewa Wywial
F Itoh
FM Richards
FT Cooke
G Odorizzi
H Stenmark
HJG Meijer
I Chaudhuri
ID Kuntz
J Callaghan
J He
J Meller
J Myers
J Schultz
JD Gary
JH Joo
JJ Dumas
JL Rosa
JM Gaullier
JU Bowie
JY Jung
K Diraviyam
K Nicholas
L Estrada
L Renault
L Zonia
LF Seet
M Kanehisa
MA Marti-Renom
MJ Sippl
MP Jacobson
N Mirkovic
NR Blatner
O Hadjebi
O Lund
O Teodorescu
P Welters
P Whitley
R Luethy
RB Jensen
RL Tatusov
RM Bennett-Lovsey
RV Stahelin
S Corvera
S Misra
S Peleg-Grossman
SCE Tosatto
SF Altschul
SH Ridley
Shaneen M Singh
SK Dove
SM Singh
T Kutateladze
T Munnik
T Tsukazaki
TE Ferrin
TG Kutateladze
TJ Hsieh
V Patki
V Sobolev
W van Leeuwen
Y Lee
Y Leshem
Y Mao
Z Hong
Z Xiang
Publication venue: BioMed Central
Publication date: 01/08/2010

Background: FYVE domains have emerged as membrane-targeting domains highly specific for phosphatidylinositol 3-phosphate (PtdIns(3)P). They are predominantly found in proteins involved in various trafficking pathways. Although FYVE domains may function as individual modules, dimers or in partnership with other proteins, structurally, all FYVE domains share a fold comprising two small characteristic double-stranded β-sheets and a C-terminal α-helix, which houses eight conserved Zn²⁺ ion-binding cysteines. To date, the structural, biochemical and biophysical mechanisms for subcellular targeting of FYVE domains have been worked out for proteins from various model organisms, but plant FYVE domains remain noticeably under-investigated. Results: We carried out an extensive examination of all Arabidopsis FYVE domains, including their identification, classification, molecular modeling and biophysical characterization using computational approaches. Our classification of fifteen Arabidopsis FYVE proteins at the outset reveals unique domain architectures for FYVE-containing proteins, which are not paralleled in other organisms. Detailed sequence analysis and biophysical characterization of the structural models are used to predict membrane interaction mechanisms previously described for other FYVE domains and their subtle variations, as well as novel mechanisms that seem to be specific to plants. Conclusions: Our study contributes to the understanding of the molecular basis of FYVE-based membrane targeting in plants on a genomic scale. The results show that FYVE domain-containing proteins in plants have evolved to incorporate significant differences from those in other organisms, implying that they play a unique role in plant signaling pathways and/or play similar or parallel roles in signaling to other organisms but use different protein players or signaling mechanisms.
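As a rough illustration of sequence-based identification (not the authors' pipeline, which the abstract does not detail), the following sketch is a crude first-pass filter. It looks for the conserved basic motif R(R/K)HHCR, widely described as part of the PtdIns(3)P-binding site of FYVE domains. It then requires the cysteine-rich context expected from the eight Zn²⁺-binding cysteines. The window size `flank` and the `min_cys` threshold are hypothetical. The toy sequences are synthetic. A real genome-wide scan would normally use profile or HMM searches instead.

```python
import re
from typing import Dict, List

# Basic R(R/K)HHCR motif described for FYVE domains (PtdIns(3)P binding).
BASIC_MOTIF = re.compile(r"R[RK]HHCR")


def fyve_candidates(proteins: Dict[str, str], flank: int = 40,
                    min_cys: int = 8) -> Dict[str, List[int]]:
    """Return {protein: [0-based motif starts]} for motifs that sit in a
    cysteine-rich window (>= min_cys cysteines within `flank` residues either side).

    `flank` and `min_cys` are illustrative thresholds, not published cut-offs.
    """
    hits: Dict[str, List[int]] = {}
    for name, seq in proteins.items():
        seq = seq.upper()
        starts = []
        for m in BASIC_MOTIF.finditer(seq):
            window = seq[max(0, m.start() - flank): m.end() + flank]
            if window.count("C") >= min_cys:
                starts.append(m.start())
        if starts:
            hits[name] = starts
    return hits


toy = {
    "toy1": "CCCC" + "A" * 10 + "RRHHCR" + "A" * 10 + "CCC",  # 8 Cys incl. the motif's own
    "toy2": "A" * 20 + "RKHHCR" + "A" * 20,                   # motif but too few Cys
    "toy3": "MSTNPKPQRKTKRNTNRR",                             # no motif
}
print(fyve_candidates(toy))  # {'toy1': [14]}
```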

City University of New York

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges

Author: Altman RB
Azaria JR
Babbi G
Bachar E
Bhat RR
Bovo S
Brenner SE
Bromberg Y
Carraro M
Casadio R
Chang B
Daneshjou R
Di Lena P
Edwards M
Ferrari C
Franke A
Gasparini A
Gifford D
Giollo M
Hoskins RA
Jiang Y
Jones DT
Klein TE
Kundu K
Leonardi E
Li B
Li X
Martelli PL
McCombie R
Mooney SD
Morgan AA
Moult J
Niroula A
Ofran Y
Pagel KA
Pal LR
Pejaver V
Petersen B-S
Pirooznia M
Potash JB
Radivojac P
Repo S
Shah S
Sundaram L
Tosatto SCE
Unger R
Vihinen M
Wang MH
Wang Y
Yin Y
Zandi P
Publication venue: Royal College of Obstetricians & Gynaecologists (RCOG)
Publication date: 01/09/2017

Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype–phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome-sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype–phenotype relationships.
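The abstract does not name the assessment measures. One standard choice for case/control risk scores, such as predicted Crohn's disease status, is the area under the ROC curve. This equals the probability that a randomly chosen case receives a higher score than a randomly chosen control. The following is a minimal sketch of that rank-based (Mann-Whitney) formulation. It assumes binary labels (1 = case) and one score per individual. The function name and data are hypothetical. The pairwise loop is quadratic, which is acceptable for small challenge cohorts.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a case > score of a control); ties count one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy scores, not CAGI data: 3 of 4 case/control pairs ordered correctly, 1 tie.
print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.6, 0.6]))  # 0.875
```

An AUC of 0.5 corresponds to random ranking, so it gives a threshold-free way to compare groups that may calibrate their probabilities differently.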

UCL Discovery

PDBe-KB: collaboratively defining the biological context of structural data

Author: Al-Lazikani B
Andreini C
Anyango S
Armstrong D
Barton GJ
Bednar D
Berka K
Berrisford J
Blundell T
Brock KP
Carazo JM
Choudhary P
Damborsky J
David A
Deshpande M
Dey S
Dunbrack R
Fraternali F
Gibson T
Helmer-Citterich M
Hoksza D
Hopf T
Jakubec D
Kannan N
Krivak R
Kumar M
Levy ED
London N
Macias JR
Marks DS
Martens L
McGowan SA
McGreig JE
Modi V
Nadzirin N
Nair SS
Orengo C
Parra RG
Pepe G
Piovesan D
Pravda L
Prilusky J
Putignano V
Radusky LG
Ramasamy P
Rausch AO
Recio JF
Reuter N
Rodriguez LA
Rollins NJ
Rosato A
Rubach P
Serrano L
Singh G
Skoda P
Sorzano COS
Srivatsan MM
Sternberg M
Stourac J
Sulkowska JI
Svobodova R
Tanweer A
Thornton J
Tichshenko N
Tosatto SCE
Varadi M
Velankar S
Vranken W
Wass MN
Xue D
Zaidman D
Publication venue: Oxford University Press (OUP)
Publication date: 14/10/2021

The Protein Data Bank in Europe – Knowledge Base (PDBe-KB, https://pdbe-kb.org) is an open collaboration between world-leading specialist data resources contributing functional and biophysical annotations derived from or relevant to the Protein Data Bank (PDB). The goal of PDBe-KB is to place macromolecular structure data in their biological context by developing standardised data exchange formats and integrating functional annotations from the contributing partner resources into a knowledge graph that can provide valuable biological insights. Since we described PDBe-KB in 2019, there have been significant improvements in the variety of available annotation data sets and user functionality. Here, we provide an overview of the consortium, highlighting the addition of annotations such as predicted covalent binders, phosphorylation sites, effects of mutations on the protein structure and energetic local frustration. In addition, we describe a library of reusable web-based visualisation components and introduce new features such as a bulk download data service and a novel superposition service that generates clusters of superposed protein chains weekly for the whole PDB archive.
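The core idea of integrating partner annotations is to key every contributed record on the residue it describes. A single residue can then be queried across all resources. The following is a minimal, hypothetical sketch of that idea only. The `Annotation` record, the resource names and the placeholder PDB identifier "0xyz" are assumptions, not the PDBe-KB data exchange schema or its knowledge graph, both of which carry far more detail.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    resource: str   # contributing partner resource (hypothetical names below)
    pdb_id: str
    chain: str
    residue: int    # residue number in the deposited structure (assumed)
    label: str
    score: float


ResidueKey = Tuple[str, str, int]  # (pdb_id, chain, residue)


def residue_index(annotations: List[Annotation]) -> Dict[ResidueKey, List[Annotation]]:
    """Group annotations by residue; PDB IDs are normalised to lower case."""
    index: Dict[ResidueKey, List[Annotation]] = defaultdict(list)
    for a in annotations:
        index[(a.pdb_id.lower(), a.chain, a.residue)].append(a)
    return dict(index)


def residues_with_multiple_sources(index: Dict[ResidueKey, List[Annotation]],
                                   min_resources: int = 2) -> List[ResidueKey]:
    """Residues annotated by at least `min_resources` distinct resources."""
    return sorted(key for key, anns in index.items()
                  if len({a.resource for a in anns}) >= min_resources)


annotations = [
    Annotation("resourceA", "0XYZ", "A", 10, "predicted covalent binder", 0.9),
    Annotation("resourceB", "0xyz", "A", 10, "phosphorylation site", 1.0),
    Annotation("resourceA", "0xyz", "A", 11, "predicted covalent binder", 0.4),
]
index = residue_index(annotations)
print(residues_with_multiple_sources(index))  # [('0xyz', 'A', 10)]
```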

Spiral - Imperial College Digital Repository

An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Author: Almeida-e-Silva DC
Altenhoff A
Babbitt PC
Bankapur AR
Bargsten JW
Ben-Hur A
Benso A
Bhat P
Bonneau R
Brenner SE
Bryson K
Cao RZ
Casadio R
Cejuela JM
Chapman S
Chen CT
Cheng JL
Cibrian-Uhalte E
Clark WT
Cozzetto D
D'Andrea D
Das S
Dawson NL
del Pozo A
Denny P
Dessimoz C
Di Carlo S
Dogan T
Dukka BKC
ElShal S
Falda M
Fang H
Feng S
Fernandez JM
Ferrari C
Fontana P
Foulger RE
Friedberg I
Funk CS
Gabaldon T
Gemovic B
Gillis J
Ginter F
Giollo M
Glisic S
Goldberg T
Gong QT
Gough J
Greene CS
Hakala K
Hamp T
Hieta R
Holm L
Hsu WL
Huntley RP
Jiang YX
Jones DT
Kaewphan S
Kahanda I
Kansakar L
Khan IK
Kihara D
Koo DCE
Koskinen P
Lavezzo E
Lee D
Lees JG
Legge D
Lepore R
Li B
Lin A
Linial M
Lovering RC
Magrane M
Maietta P
Marcet-Houben M
Martelli PL
Martin MJ
Mehryary F
Melidoni AN
Mesiti M
Minneci F
Mooney SD
Moreau Y
Mutowo-Meullenet P
Nepusz T
Ning W
O'Donovan C
Oates M
Ofer D
Orengo CA
Oron TR
Paccanaro A
Pavlidis P
Penfold-Brown D
Perovic V
Pichler K
Piovesan D
Politano G
Profiti G
Radivojac P
Rappoport N
Re M
Rehman HU
Richter L
Robinson PN
Romero AE
Rost B
Sahraeian SME
Salakoski T
Salamov A
Sasidharan R
Savino A
Sedeno-Cortes AE
Sharan M
Shasha D
Shypitsyna A
Sillitoe I
Skunca N
Smithers B
Stern A
Sternberg MJE
Supek F
Tian WD
Toppo S
Toronen P
Tosatto SCE
Tramontano A
Tranchevent LC
Tress ML
Valencia A
Valentini G
van Dijk ADJ
Veljkovic N
Veljkovic V
Vencio RZN
Verspoor KM
Vogel J
Vucetic S
Wang Z
Wass MN
Yang HX
Youngs N
Zakeri P
Zhang S
Zhong Z
Zhou YP
Publication venue: Springer Science and Business Media LLC
Publication date: 28/10/2022

Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and that predictions are relatively diverse in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
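A headline metric in CAFA is the protein-centric Fmax. At each threshold τ, precision is averaged only over proteins with at least one prediction scoring at or above τ, while recall is averaged over all benchmark proteins. Fmax is then the best harmonic mean over thresholds. This differs from a pooled, residue-level F1. The following is a minimal sketch under stated assumptions. True term sets are taken to be already propagated up the ontology. Thresholds default to a 0.01-step grid (assumed). Term names are placeholders. The official CAFA assessment code handles further details, such as ontology propagation of predictions.

```python
from typing import Dict, Optional, Sequence, Set, Tuple


def cafa_fmax(truth: Dict[str, Set[str]],
              predictions: Dict[str, Dict[str, float]],
              thresholds: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Return (Fmax, best threshold) in the protein-centric CAFA style.

    truth: protein -> set of true terms (assumed non-empty and propagated).
    predictions: protein -> {term: score in [0, 1]}.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = (0.0, float("nan"))
    for tau in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            predicted = {t for t, s in predictions.get(protein, {}).items() if s >= tau}
            tp = len(predicted & true_terms)
            if predicted:                      # precision only where something is predicted
                precisions.append(tp / len(predicted))
            recalls.append(tp / len(true_terms))  # recall over every benchmark protein
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            f = 2 * pr * rc / (pr + rc)
            if f > best[0]:
                best = (f, tau)
    return best


truth = {"P1": {"term_a", "term_b"}, "P2": {"term_c"}}                  # toy data
predictions = {"P1": {"term_a": 0.9, "term_x": 0.5}, "P2": {"term_c": 0.4}}
print(cafa_fmax(truth, predictions, thresholds=[0.3, 0.6]))  # (0.75, 0.3)
```

At τ = 0.6, P2 has no predictions, so it drops out of the precision average but still contributes zero recall. This asymmetry is what stops a method from inflating Fmax by predicting only for proteins it is confident about.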

UTUPub